Clean benchmark artifacts and freeze paper snapshot #5

Merged
MaxGhenis merged 44 commits into main from codex/manifest-resume-guards
May 2, 2026

@MaxGhenis
Contributor

Summary

  • remove legacy v1/v2 result artifacts and generated paper scratch outputs from the tracked repo
  • add a frozen 2026-05-01 paper snapshot with hash/count tests
  • switch public CLI/docs wording to PolicyEngine reference outputs while keeping ground-truth as a compatibility alias
  • fix the full-run exporter/runbook to combine by-model chunked outputs correctly
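The hash/count tests mentioned for the frozen snapshot could look roughly like the following. This is a minimal sketch, not the repository's actual test code: the function names `build_manifest` and `verify_snapshot`, the manifest layout, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(snapshot_dir: Path) -> dict:
    """Record a digest and a line count for every file under snapshot_dir.

    Hypothetical helper: the real snapshot may store different fields.
    """
    manifest = {}
    for path in sorted(snapshot_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(snapshot_dir).as_posix()
            manifest[rel] = {
                "sha256": sha256_of(path),
                "lines": len(path.read_bytes().splitlines()),
            }
    return manifest


def verify_snapshot(snapshot_dir: Path, manifest: dict) -> list[str]:
    """Return a list of mismatches; an empty list means nothing changed."""
    current = build_manifest(snapshot_dir)
    problems = []
    for name in sorted(set(manifest) | set(current)):
        if name not in current:
            problems.append(f"missing: {name}")
        elif name not in manifest:
            problems.append(f"unexpected: {name}")
        elif current[name] != manifest[name]:
            problems.append(f"changed: {name}")
    return problems
```

A test would then commit the manifest produced at freeze time and assert that `verify_snapshot` returns an empty list, so any later edit to a frozen file fails the suite.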

Verification

  • uv run pytest -q
  • uv run ruff check .
  • uv run ruff format --check .
  • npm --prefix app run lint
  • npm --prefix app run build
  • uv run python paper/render_paper.py
  • git diff --check

No benchmark LLM responses were regenerated.
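Because no responses were regenerated, the exporter fix amounts to recombining chunk files that already exist. The sketch below illustrates one way to merge by-model chunks while guarding against double-counted rows. The directory layout (`<root>/<model>/*.csv`), the function name `combine_chunks`, and the key columns are hypothetical and are not taken from the repository.

```python
import csv
from pathlib import Path


def combine_chunks(root: Path, key_fields: tuple[str, ...]) -> list[dict]:
    """Merge every CSV chunk under root/<model>/ into one list of rows.

    Assumed layout: one subdirectory per model, each holding chunk CSVs.
    Each row is tagged with its model name, and a repeated
    (model, *key_fields) combination raises instead of being merged twice.
    """
    rows = []
    seen = set()
    for model_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Chunks are read in lexicographic filename order.
        for chunk in sorted(model_dir.glob("*.csv")):
            with chunk.open(newline="") as handle:
                for row in csv.DictReader(handle):
                    row["model"] = model_dir.name
                    key = (model_dir.name, *(row[f] for f in key_fields))
                    if key in seen:
                        raise ValueError(f"duplicate row {key} in {chunk}")
                    seen.add(key)
                    rows.append(row)
    return rows
```

Raising on duplicates rather than silently dropping them is a deliberate choice in this sketch: an overlapping chunk usually signals a resume or chunking bug that should be surfaced before export.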

MaxGhenis and others added 30 commits February 25, 2026 17:49
Replace placeholder components with 4 production views:
- ScatterPlot: predicted vs actual with condition toggle
- ModelLeaderboard: ranked model comparison table
- ProgramHeatmap: variable x model accuracy grid
- ScenarioExplorer: per-household drill-down with all predictions

Remove old unused components (ModelComparison, ProgramBreakdown,
ExampleScenarios) and mock data file.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Replace two-column hero layout with single-column compact header
- Swap bordered stat cards for inline stat bar
- Add PE mark icon to hero and sticky nav brand
- Remove redundant sidebar (top models, preprint card)
- Remove redundant CTAs (View leaderboard, Explore households)
- Exclude public/paper/ from ESLint, add img-element directives

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Exclude generated notebook from ruff via extend-exclude
- Fix F402 shadow warnings in scenarios.py (rename loop vars)
- Wrap long strings/lines for E501 across all Python files
- Run ruff format for consistent style

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Merge hero and sticky nav into single component that collapses on scroll
- Expanded: full title, "a [PE logo] project" tagline, subtitle, stats
- Collapsed: compact bar with nav tabs, view selector, Paper link
- Smooth CSS transitions between states
- Zero duplication — one header, two modes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Replace binary scrolled toggle with continuous scroll progress (0→1)
- All header properties interpolate smoothly: title size, padding,
  opacity, background, nav visibility
- Change tagline from "a PE project" to "by [PE logo]"
- Uses rAF-throttled scroll listener for 60fps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Wrap long help strings in cli.py and split long assertion lines in tests.
Also wraps two pre-existing E501 violations in analysis.py and test_analysis.py.

All 133 tests pass. ruff check + ruff format --check both clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented May 2, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: policybench
Deployment: Ready
Actions: Preview, Comment
Updated (UTC): May 2, 2026 2:17am

@MaxGhenis MaxGhenis merged commit 1f0f7cb into main May 2, 2026
4 checks passed
@MaxGhenis MaxGhenis deleted the codex/manifest-resume-guards branch May 2, 2026 02:19

2 participants